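In this study, we use Gaussian processes, probabilistic neural networks, natural gradient boosting, and quantile-regression gradient boosting to model the lead times of laser manufacturing processes. We introduce probabilistic modelling to this domain and compare the models with respect to their different capabilities. Because the models are compared on real-life data, our work has many use cases and substantial business value. Our results indicate that all models outperform the company's estimation benchmark, which relies on domain experience, and that they are well calibrated against the empirical frequencies.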
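As a rough illustration of how quantile predictions and their empirical frequency calibration can be checked, the sketch below computes the pinball (quantile) loss and the empirical coverage of a prediction interval. The lead-time values and predicted quantiles are invented for illustration and are not taken from the study; the function names are hypothetical as well.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss of predictions y_pred at quantile level q."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


def empirical_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)


# Hypothetical lead times (days) and predicted 10% / 90% quantiles.
lead_times = [4.0, 7.5, 3.2, 9.1, 5.0]
q10 = [3.0, 5.0, 3.5, 6.0, 4.0]
q90 = [6.0, 9.0, 5.0, 9.0, 7.0]

print("pinball loss at q=0.9:", pinball_loss(lead_times, q90, 0.9))
print("coverage of the nominal 80% interval:",
      empirical_coverage(lead_times, q10, q90))
```

A well-calibrated model would show an empirical coverage close to the nominal level (here 80%) on a sufficiently large held-out set; five points are far too few to judge this in practice.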
We present a novel image inversion framework and a training pipeline to achieve high-fidelity image inversion with high-quality attribute editing. Inverting real images into StyleGAN's latent space is an extensively studied problem, yet the trade-off between image reconstruction fidelity and image editing quality remains an open challenge. Low-rate latent spaces are limited in their expressive power for high-fidelity reconstruction. High-rate latent spaces, on the other hand, degrade editing quality. In this work, to achieve high-fidelity inversion, we learn residual features in higher latent codes that lower latent codes were not able to encode. This enables preserving image details in reconstruction. To achieve high-quality editing, we learn how to transform the residual features to adapt to manipulations in latent codes. We train the framework to extract residual features and transform them via a novel architecture pipeline and cycle consistency losses. We run extensive experiments and compare our method with state-of-the-art inversion methods. Qualitative metrics and visual comparisons show significant improvements. Code: https://github.com/hamzapehlivan/StyleRes
Artificial Intelligence (AI) and its applications have sparked extraordinary interest in recent years. This achievement can be ascribed in part to advances in AI subfields including Machine Learning (ML), Computer Vision (CV), and Natural Language Processing (NLP). Deep learning, a subfield of machine learning that employs artificial neural network concepts, has enabled the most rapid growth in these domains. As a result, the integration of vision and language has attracted a lot of attention. The tasks have been created in such a way that they properly exemplify the concepts of deep learning. In this review paper, we provide a thorough and extensive review of state-of-the-art approaches and key model design principles, and we discuss existing datasets, methods, problem formulations, and evaluation measures for VQA and visual reasoning tasks in order to understand vision-and-language representation learning. We also present some potential future directions in this field of research, in the hope that our study may generate new ideas and novel approaches to handle existing difficulties and develop new applications.
Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move. This information is useful to make inferences about 3D shape, physical properties and object interactions. While the problem of tracking arbitrary physical points on surfaces over longer video clips has received some attention, no dataset or benchmark for evaluation existed, until now. In this paper, we first formalize the problem, naming it tracking any point (TAP). We introduce a companion benchmark, TAP-Vid, which is composed of both real-world videos with accurate human annotations of point tracks, and synthetic videos with perfect ground-truth point tracks. Central to the construction of our benchmark is a novel semi-automatic crowdsourced pipeline which uses optical flow estimates to compensate for easier, short-term motion like camera shake, allowing annotators to focus on harder sections of video. We validate our pipeline on synthetic data and propose a simple end-to-end point tracking model TAP-Net, showing that it outperforms all prior methods on our benchmark when trained on synthetic data.
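Purpose: Object detection is evolving rapidly through machine learning techniques in automation systems. Well-prepared data is necessary to train the algorithms. Accordingly, the objective of this paper is to describe a re-evaluation of the Logistics Objects in Context (LOCO) dataset, which is the first dataset for object detection in the field of intralogistics. Methodology: We use an experimental research approach with three steps to evaluate the LOCO dataset. First, the images on GitHub were analyzed to understand the dataset better. Second, Google Drive Cloud was used for training purposes to revisit the algorithmic implementation and training. Finally, the LOCO dataset was examined to determine whether the same training results can be achieved as in the original publication. Findings: The mean average precision achieved in our study, a common benchmark in object detection, shows a significant increase over the initial study by the LOCO authors, which achieved 41%. However, improvement potential is seen specifically for the object types forklift and pallet truck. Originality: This paper presents the first critical replication study of the LOCO dataset for object detection in intralogistics. It shows that training with better hyperparameters based on LOCO can achieve even higher precision than reported in the original publication. However, there is still room to further improve the LOCO dataset.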
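For readers unfamiliar with the metric, the following is a minimal sketch of how mean average precision is typically built up: an intersection-over-union (IoU) test decides whether a detection matches a ground-truth box, and average precision summarises the ranked detections of one class. It uses a simple non-interpolated average precision; benchmark protocols such as PASCAL VOC or COCO use interpolated variants and specific IoU thresholds, and nothing here reproduces the exact evaluation used in the study. All function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(is_true_positive, num_ground_truth):
    """Non-interpolated AP for one class.

    is_true_positive: booleans for detections sorted by descending confidence.
    num_ground_truth: number of ground-truth objects of that class.
    """
    if num_ground_truth == 0:
        return 0.0
    true_positives = 0
    precision_sum = 0.0
    for rank, hit in enumerate(is_true_positive, start=1):
        if hit:
            true_positives += 1
            # Precision measured at each rank where recall increases.
            precision_sum += true_positives / rank
    return precision_sum / num_ground_truth


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)
```

A per-class breakdown of this kind is what makes weaker classes visible even when the overall mean looks good.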
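Emotion classification from EEG signals has seen many advances. However, issues such as the lack of data and the difficulty of learning important features and patterns have always left room for improvement in both computation and prediction accuracy. This work analyzes the performance of baseline machine learning classifiers on the DEAP dataset, together with a tabular learning approach that delivers results comparable to the state of the art, leveraging the performance boost of its deep learning architecture without deploying heavy neural networks.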
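Widely used deep learning models have been found to have poor robustness. A small amount of noise can fool state-of-the-art models into making incorrect predictions. Although there are many high-performance attack generation methods, most of them add perturbations directly to the original data and measure them with L_p norms; this can break the major structure of the data and thus produce invalid attacks. In this paper, we propose a black-box attack that, instead of modifying the original data, modifies the latent features of the data extracted by an autoencoder; we then measure the noise in semantic space in order to protect the semantics of the data. We trained autoencoders on the MNIST and CIFAR-10 datasets and found optimal adversarial perturbations using a genetic algorithm. Our approach achieved a 100% attack success rate on the first 100 samples of the MNIST and CIFAR-10 datasets, with smaller perturbations.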
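The general idea of searching for a latent-space perturbation with a genetic algorithm can be sketched as follows. This is a toy illustration only: `decode` and `classify` are hypothetical stand-ins for a trained decoder and the black-box model, the latent space is two-dimensional, and the fitness function, crossover, and mutation settings are assumptions rather than the choices made in the paper.

```python
import math
import random


def decode(z):
    """Toy stand-in for a trained decoder: maps a 2-D latent code to 2-D data."""
    return (2.0 * z[0], z[0] + z[1])


def classify(x):
    """Toy stand-in for the black-box model: only its output label is used."""
    return 1 if x[0] + x[1] > 0 else 0


def fitness(z, delta, original_label):
    """Rank label-flipping perturbations first, then prefer a small latent norm."""
    perturbed = (z[0] + delta[0], z[1] + delta[1])
    flipped = classify(decode(perturbed)) != original_label
    return -math.hypot(delta[0], delta[1]) if flipped else -1000.0


def genetic_attack(z, generations=50, pop_size=30, sigma=1.0, seed=0):
    """Search for a small latent perturbation that changes the predicted label."""
    rng = random.Random(seed)
    original_label = classify(decode(z))
    population = [(rng.gauss(0, sigma), rng.gauss(0, sigma))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda d: fitness(z, d, original_label), reverse=True)
        elite = population[: pop_size // 5]  # keep the best 20% unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(elite, 2)
            # Uniform crossover followed by Gaussian mutation.
            child = tuple(rng.choice(pair) + rng.gauss(0, 0.1 * sigma)
                          for pair in zip(parent_a, parent_b))
            children.append(child)
        population = elite + children
    return max(population, key=lambda d: fitness(z, d, original_label))


latent = (-0.5, -0.5)
best = genetic_attack(latent)
adversarial = decode((latent[0] + best[0], latent[1] + best[1]))
print("perturbation norm:", math.hypot(*best), "new label:", classify(adversarial))
```

Because the search only queries the label returned by `classify`, it needs no gradients from the attacked model, which is what makes this style of attack black-box.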
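Deep learning is rapidly gaining adoption in healthcare to help improve patient outcomes. This is especially true in medical image analysis, where extensive training is required to gain the expertise needed to become a trusted practitioner. However, while deep learning techniques continue to provide state-of-the-art predictive performance, one of the primary challenges hindering this progress in healthcare is the opaque nature of these models' inference mechanisms. Attribution therefore plays a vital role in building stakeholders' confidence in the predictions that deep learning models make to inform clinical decisions. This work seeks to answer the question: what do deep neural network models learn in medical images? In that light, we present a novel attribution framework using adaptive path-based gradient integration techniques. The results show a promising direction for building trust among domain experts and improving healthcare outcomes by allowing them to understand the input-prediction correlative structures, discover new biomarkers, and reveal potential model biases.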
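To make the notion of path-based gradient integration concrete, here is a minimal sketch of the standard integrated-gradients attribution along a straight-line path from a baseline to the input. It is not the adaptive-path method proposed in the abstract; the toy `model`, the zero baseline, and the numerical gradient are all assumptions chosen so that the example stays self-contained.

```python
def numerical_gradient(f, x, eps=1e-5):
    """Central-difference gradient of a scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


def integrated_gradients(f, x, baseline, steps=50):
    """Attribute f(x) - f(baseline) to each input feature.

    Gradients are averaged along the straight line from baseline to x
    (midpoint Riemann sum) and scaled by the feature's displacement.
    """
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = numerical_gradient(f, point)
        for i in range(len(x)):
            avg_grad[i] += grad[i] / steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


def model(x):
    """Toy differentiable 'model' standing in for a network output."""
    return x[0] ** 2 + 3.0 * x[1]


attributions = integrated_gradients(model, [2.0, 1.0], [0.0, 0.0])
print("attributions:", attributions)
print("sum vs. output difference:", sum(attributions),
      model([2.0, 1.0]) - model([0.0, 0.0]))
```

The attributions sum (up to numerical error) to the difference between the model output at the input and at the baseline, the completeness property that makes path-integrated attributions straightforward to sanity-check.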
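Drilling a hole on a curved surface is prone to failure when done manually, owing to the difficulty of drill alignment and the inherent instability of the task, and it can cause injury and fatigue to workers. On the other hand, fully automating such a task in real manufacturing environments can be impractical, because the parts arriving at an assembly line can have various complex shapes on which the drill locations are not easily accessible, which makes automated path planning difficult. In this work, an adaptive admittance controller with 6 degrees of freedom is developed and deployed on a KUKA LBR iiwa 7 cobot, so that the operator can comfortably manipulate a drill mounted on the robot with one hand and open holes on a curved surface with haptic guidance from the cobot and visual guidance provided through an AR interface. Real-time adaptation of the admittance damping provides more transparency when driving the robot in free space while ensuring stability during drilling. After the user brings the drill sufficiently close to the drill target and roughly aligns it with the desired drilling angle, the haptic guidance module first fine-tunes the alignment and then constrains the user's movement to the drilling axis only, after which the operator simply pushes the drill into the workpiece with minimal effort. Two sets of experiments were conducted to investigate the potential benefits of the haptic guidance module quantitatively (Experiment I) and the practical value of the proposed pHRI system for real manufacturing settings based on the subjective opinions of the participants (Experiment II).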
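The role of the damping term in an admittance controller can be illustrated with a one-dimensional sketch of the law mass * dv/dt + damping * v = force. The 6-DOF controller and its real-time adaptation rule are not specified in the abstract, so the simple contact-based switch below, the parameter values, and all function names are assumptions for illustration only.

```python
def adaptive_damping(in_contact, b_free=10.0, b_contact=60.0):
    """Hypothetical rule: low damping in free space, high damping while drilling."""
    return b_contact if in_contact else b_free


def admittance_step(velocity, force, mass, damping, dt):
    """One explicit-Euler step of  mass * dv/dt + damping * v = force."""
    acceleration = (force - damping * velocity) / mass
    return velocity + dt * acceleration


def simulate(force, in_contact, mass=2.0, dt=0.001, steps=5000):
    """Commanded velocity after applying a constant operator force for steps * dt seconds."""
    velocity = 0.0
    for _ in range(steps):
        damping = adaptive_damping(in_contact)
        velocity = admittance_step(velocity, force, mass, damping, dt)
    return velocity


print("free-space velocity for a 5 N push:", simulate(5.0, in_contact=False))
print("in-contact velocity for a 5 N push:", simulate(5.0, in_contact=True))
```

With low damping the same push produces a faster commanded motion, so the robot feels more transparent to the operator; higher damping slows the response, which is the usual way to trade transparency for stability during contact.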
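A scanning pixel camera is a novel low-cost, low-power sensor that is not diffraction limited. It produces data as a sequence of samples extracted from various parts of the scene over the course of a scan. It can provide very detailed images at the expense of high sampling requirements and slow image acquisition times. This paper proposes a new algorithm that allows the sensor to adapt its sampling rate over the course of this sequence. This makes it possible to overcome these limitations by minimising the bandwidth and time required to image and transmit a scene while maintaining image quality. We examine applications to image classification and semantic segmentation and are able to achieve results similar to those obtained with a fully sampled input while using 80% fewer samples.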